List of Web archiving initiatives

This page contains a list of Web archiving initiatives worldwide. For easier reading, the information is divided into three tables: web archiving initiatives, archived data and access methods.

Web archiving initiatives

Name Country Creation Year Technologies Number of Employees (Full-time) Number of Employees (Part-time) Comments
Australia's Web Archive[1] Australia 1996 PANDORA Digital Archiving System (PANDAS), NLA Trove, HTTrack. 4 >4.25 It is a collaborative program of 11 agencies that provide an estimated average monthly staffing equivalent to 4 FTE. IT outsourced support: 0.25 person-month. Whole Domain Harvests are conducted by the Internet Archive using Heritrix, Wayback Machine.
Our digital island, a Tasmanian Web Archive[2] Australia 1996 HTTrack, Experimentally: Web Curator, Heritrix and Wayback Machine 1
PageFreezer [3] Canada, US, Netherlands, Belgium 2005 PageFreezer's Deep Web Crawler, Lucene, Solr Enterprise Class On Demand service to archive and replay websites, blogs, Ajax, Flash, video, audio & social media for litigation protection, eDiscovery and regulatory compliance with FDA, FINRA, FSA, SEC, SOX, Federal Rules of Evidence and records management laws.
Web@rchive Austria[4] Austria 2008 Archive-access tools and NetarchiveSuite.dk 2
DILIMAG (Digital Literature Magazines)[5] Austria 2007 WebCurator 2 One technician, one for collecting and metadata.
Government of Canada Web Archive (GCWA)[6] Canada 2005 Heritrix, Wayback Machine and Nutchwax. 2
Web Information Collection and Preservation - WICP (Chinese Web Archive)[7] China 2003 Heritrix, Wayback Machine and Nutchwax.
Croatian Web Archive (Hrvatski arhiv weba - HAW)[8] Croatia 2004 Lucene 4 3 2 full-time librarians, 2 part-time librarians, 1 IT professional (National and University Library in Zagreb), 1 or 2 IT professionals from our partner, the Zagreb University Computing Centre (Srce).
WebArchiv (National Library of the Czech Republic)[9] Czech Republic 2000 Nutch, NutchWAX and WERA tools. 5 3.5 FTE library staff + approx. 1.5 FTE technical staff
Netarkivet.dk[10] Denmark 2005 NetarchiveSuite.dk and Heritrix. 18 18 people involved (developers, librarians, operations staff, project managers). All together 5 FTE.
Finnish Web Archive[11] Finland 2008 NutchWAX 2 >2 A group of librarians selects, on a part-time basis, what to archive from the Finnish web space.
BnF - BnF Web Legal Deposit[12] France 2006 Heritrix, Wayback Machine and NutchWAX. NetarchiveSuite. 9
Ina (Institut National de l'Audiovisuel)[13] France 2009 Crawl: PhagoSite, Croket, Heritrix / Access: Dowser 6 Staff of 80 documentalists taking part in nominating sites and QA
E-diaspora (Télécom ParisTech, FMSH)[14] France 2010 Crawl: PhagoSite 1 30 researchers taking part in nominating sites
Internet Memory Foundation (ATN service)[15] France, Netherlands 2004 IM large scale crawler (under development), Heritrix, Hanzo's crawler, IM Access software. Storage of Web Content: Hbase 21 0 11 people for quality crawls (QA, crawl engineering, project management), 9 developers & infrastructure, 1 manager.
Bibliotheksservice-Zentrum Baden-Württemberg[16] Germany 2003 7.5
Web archive of the German Bundestag[17] Germany 2005
Iceland[18] Iceland 2004 Heritrix, Wayback Machine
Japan Web Archiving Project[19] Japan 2004 Heritrix, Solr. Previously: Wget, Accela BizSearch 10 2 Launched in April 2004 as a pilot project, WARP (Web Archiving Project) has been in full-scale operation since July 2007.[20]
National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)[21] Korea 2001 Own system based on Oracle DBMS and specialized search engine (IRS) that performs data management and search function. 3 11
Koninklijke Bibliotheek[22] Netherlands 2006 Heritrix, KB e-Depot system 1 ~7
National Library of Latvia[23] Latvia 2005 Heritrix 1 Currently only storing for preservation; public access is in development (ETA June 2012). The Latvian term for web harvesting is "rasmošana".
New Zealand Web Archive[24] New Zealand 1999 Wayback Machine 3 >10 3-4 people at the National Library (various hours) and 2 people at the Internet Archive during the time of domain harvests.
Selective web archiving = 3 full time staff.
Technical services = 1 staff member responds to technical problems when they arise.
National Digital library = 2-3 staff members ad hoc.
NDHA (National Digital Heritage Archive) = various staff members respond to web archiving issues as they arise.
The National Library of Norway[25] Norway
Portuguese Web Archive[26] Portugal 2007 Heritrix, Wayback Machine, NutchWAX 4 1
Web archive of Čačak[27] Serbia 2009 HTTrack 1
Web Archive Singapore[28] Singapore Wayback Machine, Heritrix, NutchWAX, WERA
Slovenian Web Archive[29] Slovenia 2007 Heritrix, Wayback Machine 1
Digital Preservation of .ES domain[30] Spain 2006 Internet Archive 2 >2 Can pool additional resources if necessary from computing controllers and financial department.
Digital Heritage of Catalonia[31] Spain 2006 Heritrix, Wayback Machine, WERA, Nutchwax and Web Curator. 4
Basque Digital Heritage Archive[32] Spain 2008 Heritrix, Wayback Machine, Nutchwax and Web Curator. 1
Sweden (Kulturarw3)[33] Sweden 1996 Heritrix. Own system for storage, maintenance and access 1.25 Pause in operation from November 2009 to May 2011.
Aleph Archives[34] Switzerland/USA 2010 Distributed crawler, ArchiView access plugin, High performance search engine, Near real time indexing, Web Monitoring tools 7 Enterprise-grade Web archiving platform for online heritage (content, brands) preservation and eDiscovery aimed to corporates, institutions, legal and government industries seeking to preserve their web contents regardless of their types (websites, wikis, social media, forums...).
Web Archive Switzerland[35] Switzerland 2008 Heritrix, Wayback Machine 3 1 crawl engineer, 1 person for quality assurance, 1 coordinator. The curators, who do the selection, are partner libraries all over Switzerland.
NTU Web Archiving System, NTUWAS[36] Taiwan 2007 Lucene 3
Web Archive Taiwan[37] Taiwan 2007
The UK Web Archive[38] UK 2004 Heritrix, Web Curator Tool, Wayback Machine and moving to Solr for searching.
Hanzo Archives[39] UK 2006 Hanzo Crawler, Search, and Access Tools. Commercial web archiving services and appliances, for government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.
UK Government Web Archive[40] UK 2004 ATN Service 4 2 Technical side of our web archiving operation is contracted out to the Internet Memory Foundation so the figures account for QA, curatorial and management staff only
Internet Archive (provides Archive-it service)[41] USA 1996 Heritrix, Wayback Machine, NutchWAX and other tools developed by the Internet Archive 12
Reed Technology Web Archiving Services[42] USA 2010 TrueArchive™ Technology Reed Technology Web Archiving Services provides support for Litigation Protection, Compliance, e-Discovery and Social Media Management.
Columbia University Libraries Web Resources Collection Program[43] USA 2009 Archive-it service 3 >1 Part-time consultation/supervision from other librarians adding up to about 1 FTE.
North Carolina State Government Web Site Archives[44] USA 2005 Archive-it service 3
Latin American Web Archiving Project[45] USA 2005 Archive-it service
Web Archiving Project for the Pacific Islands[46] USA Archive-it service 4
Library of Congress Web Archives[47] USA 2000 Heritrix, Wayback Machine, and the DigiBoard, an in-house curatorial/permissions tool 6 80 The part time workers spend a few hours per month (on average) selecting content for the collections.
Harvard University Library: the Web Archive Collection Service (WAX)[48] USA 2006 Own system based on Archive-access and other open-source tools. >6 3 part time on IT support. External curators work within 3 units whose sizes are unknown.
Web Archiving Service from California Digital Library (WAS service)[49] USA 2005 Heritrix, Wayback Machine, NutchWAX 4 >1 The number of hours that curators devote to the service is very variable.
University of Michigan Web Archives Project[50] USA 2000 WAS service 2
University of Texas at San Antonio Web Archives[51] USA 2009 Archive-It 3 The number of hours varies depending on how the crawls are scheduled.
qumram[52] Switzerland 2010 Chronos Web Archiving Software Suite Commercial web archiving software suite. Provides both harvesting and transactional web archiving. Allows integration with any possible repository (database, file system, electronic archive or records management system). Specializes in regulatory compliance.
SAPERION[53] Germany 2011 SAPERION ECM Web Content Archive Commercial enterprise content management suite specializing in regulatory compliance. The product provides both harvesting and transactional web archiving based on the integration of qumram's[52] Chronos Web Archiving Software Suite. Web content is just another channel through which content reaches SAPERION. Other channels may be scanners, fax, e-mail, mobile devices, office suites or any other system creating content, such as ERP systems.
Bibliotheca Alexandrina's Internet Archive Egypt 2002 Heritrix, Wayback Machine 3 Current crawling interests: Egypt beyond January 25, Arab League ccTLDs

Archived data

Name Archived Contents (millions) Disk Space Occupied (TB) Archive Format TLD/Broad Crawls Selective Crawls (Yes/No) Comments
Australia's Web Archive[1] 3100 104.5 ARC/WARC .AU Y .AU crawls (2005-2009): 3 billion files (100 TB). Selective crawls (1996-today): 100 million files (4.5 TB). There are 3 copies of each content.
Our digital island, a Tasmanian Web Archive[2] 0.336 HTTrack Y Preserves online contents related to Tasmania. ODI has operated since its inception under the assumption that web sites fall within the definition of ‘Book’ in the Tasmanian Library Act 1984.[54] Thus, no permission to capture from publishers is required.
Web@rchive Austria[4] 455 6.61 ARC .AT Y A copy of the data will be stored in a high security data storage unit.
DILIMAG (Digital Literature Magazines)[5] 0.03 0.996 ARC Project from 2007-03-01 until 2010-12-23. The project DILIMAG for collecting, describing and archiving of digital German literary magazines.
Government of Canada Web Archive (GCWA)[6] 170 7 Y Selective crawls of the web domain of the Federal Government of Canada (.GC.CA)
Web Information Collection and Preservation - WICP (Chinese Web Archive)[7] .GOV.CN Y Harvests web pages about events that have great influence on society, the economy and so on, and the sites in the 'gov.cn' domain.
Croatian Web Archive (Hrvatski arhiv weba - HAW)[8] 81 3.4 Y
WebArchiv (National Library of the Czech Republic)[9] 526 24 .CZ Y Harvesting began in 2001.
Netarkivet.dk[10] 6008 190 ARC/WARC .DK Y It uses Heritrix and NetarchiveSuite.dk, which was developed by two Danish libraries.
Finnish Web Archive[11] 494 23 .FI, .AX Y Also crawls contents hosted on machines physically located in Finland, independently from their domain.
BnF - BnF Web Legal Deposit[12] 14000 200 ARC/WARC .FR Y
Ina (Institut National de l'Audiovisuel)[13] 8400 56 DAFF N Y The Digital Archive file format (DAFF) handles file redundancies. The size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be 665 TB.
E-diaspora (Télécom ParisTech, FMSH)[14] 237 2 DAFF N N The Digital Archive file format (DAFF) handles file redundancies. The size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be 10 TB.
Internet Memory Foundation (ATN service)[15] 180 WARC Can be done by partners Y Formerly European Archive.[55] Provides the Archive The Net Service (ATN Service). Selective crawls (140 TB), Domain crawls (40 TB), expect to grow to 1PB in 2011. New datacenter and a new crawler in 2011.
Bibliotheksservice-Zentrum Baden-Württemberg[16] 1 HTTrack Y Bibliotheksservice-Zentrum Baden-Württemberg operates the following web archives:
1- Baden-Württembergisches Online-Archiv (BOA)
2- Saardok
3- Literatur im Netz des Deutschen Literaturarchivs Marbach.[56]
Web archive of the German Bundestag[17] Y German Federal Parliament. Selective. Snapshots of www.bundestag.de and other web presences of the German Bundestag are made at regular intervals or at certain events. These remain available in the web archive.
Iceland[18]
Japan Web Archiving Project[19] 319.8 38.2 WARC - Y 15 TB of selective crawls based on permission (2002–2010). Started the web archiving of official institution sites based on the legislation from April 2010.
National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)[21] 24 Y Requires consent before archiving. Targets 56,401 websites. Web archiving is managed under digital resource management systems. In 2011 the web archiving system will be rebuilt.
Koninklijke Bibliotheek[22] 5 ARC Y
New Zealand Web Archive[24] 346 13 .NZ Y .NZ crawls: 105 million URLs (4.1 TB) in 2008, 170 million URLs (6.1 TB) in 2010. Selective crawls of 7 599 websites in the National Digital Heritage Archive (2.8 TB), 71 million contents estimated. Legal deposit covers born digital material (including websites).
The National Library of Norway[25]
Portuguese Web Archive[26] 889 25 ARC .PT, .CV, .AO, .MZ Y TLD crawls and integration of external collections since 2007, selective crawls since 2010.
Web archive of Čačak[27] 0.255 0.013 HTTrack Y Selective crawls of 130 sites related to the city of Čačak. Collaboration with the WebArchiv team from the National Library of the Czech Republic.
Web Archive Singapore[28] .SG Y Selective crawls of 1000 Singapore-related sites, with the written consent of the owners. Whole .SG domain archiving.
Slovenian Web Archive[29] 1.5 WARC Selective crawls
Digital Preservation of .ES domain[30] 855 30 ARC .ES Collaboration with Internet Archive. Domain crawl of .ES, harvested quarterly. Not launched publicly yet.
Digital Heritage of Catalonia[31] 200 7.7 ARC .CAT Y In accordance with the general trend, the archive model is a hybrid system consisting of: mass compilation of open-access digital resources published on the Internet (.cat); systematic archiving of the web site output of Catalan organizations; and fostering of lines of research through themed integration of the digital resources pertaining to specific events in Catalan public life (elections, museums, etc.).
Basque Digital Heritage Archive[32] 21 0.8 ARC Y
Sweden (Kulturarw3)[33] 1710 71.3 Multipart MIME .se, Swedish .nu and geolocation for other TLDs Y Bulk crawls approximately twice a year.
Selective crawls of about 140 newspapers every day.
Aleph Archives[34] 23 WARC, WARC2, ARC and HTTrack to WARC migration tools Y Enterprise-grade Web archiving platform for online heritage (content, brands) preservation and eDiscovery aimed to corporates, institutions, legal and government industries seeking to preserve their web contents regardless of their types (websites, wikis, social media, forums...).
Web Archive Switzerland[35] 0.1 ARC Y
NTU Web Archiving System, NTUWAS[36] 200 14 Y
Web Archive Taiwan[37]
The UK Web Archive[38] 6.9 ARC Y Selective crawls with previous permission. Expect to run wholesale UK domain-scale crawls once Legal Deposit legislation is implemented in April 2011. The UKWA is a spin-off from the UK Web Archiving Consortium that ended in 2007.
Hanzo Archives[39] 7 WARC Y Commercial web archiving services and appliances, for government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.
UK Government Web Archive[40] 32 ARC The UKGWA is a spin-off from the UK Web Archiving Consortium that ended in 2007.
Internet Archive (provides Archive-it service)[41] 150000 5500 World-wide Y Provides the Archive-it service and leads the Archive-access project (Internet Archive ARC access tools). The collection is mirrored at the Bibliotheca Alexandrina in Egypt.
Reed Technology Web Archiving Services[42]
Columbia University Libraries Web Resources Collection Program[43] 23.1 1.8 ARC/WARC Y Selective crawls with permission or notification; primarily thematic collections.
North Carolina State Government Web Site Archives[44] 51.5 3.8 WARC Y
Latin American Web Archiving Project[45] Y
Web Archiving Project for the Pacific Islands[46] 5.5 ARC/WARC Y Includes sites of 18 countries.
Library of Congress Web Archives[47] 5 230 ARC/WARC Y Formerly MINERVA. Selective crawls with notification and permission; primarily event and thematic collections.
Harvard University Library: the Web Archive Collection Service (WAX)[48] 19 0.661 ARC Y Selective crawls with no previous authorization.
Web Archiving Service from California Digital Library (WAS service)[49] 216 25.2 ARC/WARC Can be done by partners Y Provides the Web Archiving Service (WAS) to partners world-wide. It was developed at the California Digital Library.
University of Michigan Web Archives Project[50] 0.65 ARC/WARC Y WAS service since 2010.
University of Texas at San Antonio Web Archives[51] 26 1.135 ARC/WARC Y University administration, faculty and student sites; as well as selective captures on San Antonio and South Texas subject areas, including San Antonio organizations; San Antonio Online Journals and Blogs; Tejano and Conjunto music; Gay, Lesbian, Bisexual, Transgender and Queer Related Web sites in Texas, San Antonio and the Rio Grande Valley; Immigration/Borderlands; Mexican Cooking Blogs; San Antonio Restaurants; Renewable Energy in Texas; Rio Grande Valley Organizations; and Rio Grande Watershed and Texas Water Issues.

Access methods

Name URL history (Yes/No) Meta-data (catalog/advanced) search (Yes/No) Full-text search (Yes/No) Comments
Australia's Web Archive[1] N Y Y Selected sites are publicly available through a directory structure. Domain harvests are not. The PANDORA Archive is indexed and searchable through the NLA's single search service Trove.[57]
The Australian Domain Harvests are full-text indexed but are not currently publicly available.
Our digital island, a Tasmanian Web Archive[2] Y Y N Presents thumbnails generated through Html To Image supplemented in HTTrack. Information is organized in directory: A-Z Subject listing, A-Z Title listing.
Web@rchive Austria[4] Y N N Only accessible on special terminals at the Austrian National Library. Presents thumbnail previews of archived pages and supports keyword search within URL.
DILIMAG (Digital Literature Magazines)[5] Y Y N Metadata are publicly available; access to the archived versions is free or restricted depending on the agreement with the rights holders. Full-text search was not implemented due to lack of resources.
Government of Canada Web Archive (GCWA)[6] Y Y Y Technical details available.[58]
Web Information Collection and Preservation - WICP (Chinese Web Archive)[7] Y Archive content is only available in intranet in National Library of China. Some collections are publicly available, with meta-data search and browsable by collection.
Croatian Web Archive (Hrvatski arhiv weba - HAW)[8] Y Y Y
WebArchiv (National Library of the Czech Republic)[9] Y Y Due to copyright restrictions, only a limited number of archived websites for which agreements were signed with the publishers is available online. For other resources you can find out whether a given website was archived and the number of harvested versions. Unlimited access to all resources in WebArchiv is available from public terminals in the National Library.
Netarkivet.dk[10] Y N N Online access granted only to researchers using a proxy solution that accesses an archive index. Soon it will set up user access through the Wayback Machine. It has established a framework for running batch jobs with the possibility of data mining.
Finnish Web Archive[11] Y N 30% of material. URL search is available, but contents can only be accessed onsite. Full-text search is available for 30% of the material.
BnF - BnF Web Legal Deposit[12] Y N 15% of the collection Accessible to authorized users of the BnF, through the reading rooms of the Research Library located in Paris and Avignon. Wayback Machine interface was translated to French. Full Text search only for a relatively small portion of the collection (15% of 200 TB) indexed by Internet Archive. No current full text search implemented in workflow. Builds special collection galleries based on a selection from the archive on a given topic.
Ina (Institut National de l'Audiovisuel)[13] Y Y Y Full-text indexing is based on Lucene. To accommodate results from frequent crawls (up to every 2 hours for home pages), clustering is used to handle similar versions of pages.
E-diaspora (Télécom ParisTech, FMSH)[14] Y N N 1381 sites are currently crawled to build an archive on migrants' usage of the web; social studies researchers have launched a long-running project based on this archive (http://ediasporas.ticmigrations.fr/). Ina is handling crawls and storage.
Internet Memory Foundation (ATN service)[15] Y Y Y Provides access and search services according to partners' policies.
Bibliotheksservice-Zentrum Baden-Württemberg[16] Y Y Y Search available (under development).[59]
Web archive of the German Bundestag[17] Y N N The web archive itself consists of snapshots of www.bundestag.de and other websites. Navigation is possible by clicking on the years.[60]
Iceland[18]
Japan Web Archiving Project[19] Y Y Y Public access to sites after permission of the site owners. Open access to important publications such as white papers.
National Library of Korea - OASIS (Online Archiving & Searching Internet Sources)[21] Y Y Y 100% of the archive is indexed. Enables search by topic classification (e.g. Religion, Science, Arts). Search available.[61]
Koninklijke Bibliotheek[22] The web archive will become available online during the first half of the year 2010.
New Zealand Web Archive[24] Y Y N Domain harvests are available to selected staff only, using Wayback, and are limited to URL searches. For selective harvests, each website is described in the catalogue (providing subject, author, title and URL searches) and can be viewed by the public via the Internet by clicking on the link to the archived copy. The websites themselves, however, are not indexed.
The National Library of Norway[25] N Y Sites are integrated in the Catalog. Left bar enables facet navigation with drill-down.[62]
Portuguese Web Archive[26] Y Y Y 20% of the archive is indexed and an experimental full-text service is available. Archived data can be mined through a Hadoop platform.
Web archive of Čačak[27] N N N Plans to develop a search engine in the future. One drawback of HTTrack is that it renames files during archiving, so the original structure of the website is lost, as well as the file names.
Web Archive Singapore[28]
Slovenian Web Archive[29] Y N N The archive is not public yet. Plans to implement full-text search.
Digital Preservation of .ES domain[30] Y (Future) Y (Future) Plan to grant access through computers available at a given hall.
Digital Heritage of Catalonia[31] Y Y Y Full open access.
Basque Digital Heritage Archive[32] Y Y Y
Sweden (Kulturarw3)[33] Y N N Public access through dedicated machines in the library building.
Aleph Archives[34] Y Y Y The full-text search engine supports automatic metadata extraction and native results deduplication. Also included: antivirus checker (~250 million pages/day), archive statistics, text summarizer, archive exports (PDF, PNG, TIFF), etc.
Web Archive Switzerland[35] Y (in 2011) Y (in 2011) The archived versions of the sites are not yet accessible. Web Archive Switzerland will be open to the public by spring 2011 - only access within the National Library and the partner libraries will be possible. The sites are being catalogued and the records are integrated in our library catalog Helveticat.[63]
NTU Web Archiving System, NTUWAS[36] Y Y Y Presents page thumbnails, archived pages mapped to geographical locations.
Web Archive Taiwan[37] Y Y Y
PageFreezer [3] Y Y Y Enterprise Class On Demand service to archive and replay websites, blogs, Ajax, Flash, video, audio & social media for litigation protection, eDiscovery and regulatory compliance with FDA, FINRA, FSA, SEC, SOX, Federal Rules of Evidence and records management laws. Used by government agencies and public listed corporations in Pharmaceutical, Food, Finance, Healthcare and Retail industry.
The UK Web Archive[38] Y Y N
Hanzo Archives[39] Y Y Y Commercial web archiving services and appliances. Access includes full-text search, annotations, redaction, URL/History, archive policy and temporal browsing, and configurable metadata schema for advanced e-discovery applications. Used in government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA.
UK Government Web Archive[40] Y Y Y Full text search is operational on the UK Government Web Archive.[64] Users can browse the collection using a full A-Z list of all sites[65] and a set of categories.[66]
Internet Archive (provides Archive-it service)[41] Y Y Y URL history is available for all archived data. Meta-data and full-text search only for selected crawls. Until 2002 had a mining platform for research composed of Alexa Shell Perl Tools, av_tools and the p2 platform for parallel processing.[67] It was replaced by a simpler, direct access method that enables automatic access to files but provides no platform for processing.[68]
Reed Technology Web Archiving Services[42]
Columbia University Libraries Web Resources Collection Program[43] Y Y Y Accessible through Archive-it service.[69]
North Carolina State Government Web Site Archives[44] Y Y Y Accessible through Archive-it service.[69]
Latin American Web Archiving Project[45] Y Y Y Content can be accessed via full-text search, or by browsing by country or by specialized sample collection.
Web Archiving Project for the Pacific Islands[46] Y Y Y Supported by Archive-it service.
Library of Congress Web Archives[47] Y Y N Access provided via http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html. Records in MODS (Metadata Object Description Schema) format.
Harvard University Library: the Web Archive Collection Service (WAX)[48] Y Y Y
Web Archiving Service from California Digital Library (WAS service)[49] Y Y Y Access for private study, scholarship and research. Most archives built with WAS have not yet been published because it is up to the partners to decide if they want to provide access. There are 16 partners using the service and they have created over 80 web archives, only 30 are publicly accessible. NutchWAX performance did not permit full archive search. Upcoming transition to SOLR will permit both full archive and collection-specific full text search.
University of Michigan Web Archives Project[50] Y Y Y Powered by the WAS from the California Digital Library.[70] Access is public but usage is restricted for private study, scholarship and research.
University of Texas at San Antonio Web Archives[51] Y Y Y Accessible through Archive-it service[71] and the Texas Archival Repositories Online database[72]

References

  1. ^ a b c http://pandora.nla.gov.au/
  2. ^ a b c http://odi.statelibrary.tas.gov.au/
  3. ^ a b http://www.pagefreezer.com/
  4. ^ a b c http://www.onb.ac.at/ev/about/webarchive.htm
  5. ^ a b c http://dilimag.literature.at/
  6. ^ a b c http://www.collectionscanada.gc.ca/index-e.html
  7. ^ a b c http://210.82.118.162:9090/webarchive
  8. ^ a b c http://haw.nsk.hr/
  9. ^ a b c http://en.webarchiv.cz/
  10. ^ a b c http://netarkivet.dk
  11. ^ a b c http://verkkoarkisto.kansalliskirjasto.fi
  13. ^ a b c http://www.Ina.fr
  14. ^ a b c [1]
  15. ^ a b c http://internetmemory.org
  16. ^ a b c http://www.bsz-bw.de/index.html
  17. ^ a b c http://webarchiv.bundestag.de/cgi/kurz.php
  18. ^ a b c http://vefsafn.is/index.php?page=english
  19. ^ a b c http://warp.da.ndl.go.jp
  20. ^ http://www.ndl.go.jp/en/cdnlao/meetings/pdf/report_Japan1_doc2.pdf
  21. ^ a b c http://www.oasis.go.kr/intro_new/intro_overview_e.jsp
  22. ^ a b c http://www.kb.nl/hrd/dd/dd_projecten/webarchivering/index-en.html
  23. ^ http://www.lnb.lv/lv/par-lnb/struktura/bibliografijas-instituts
  24. ^ a b c http://www.natlib.govt.nz/collections/a-z-of-all-collections/nz-web-archive
  25. ^ a b c http://www.nb.no/
  26. ^ a b c http://www.archive.pt
  27. ^ a b c http://digital.cacak-dis.rs/english/web-archive-of-cacak/
  28. ^ a b c http://was.nl.sg/
  29. ^ a b c http://www.zal-lj.si/
  30. ^ a b c http://www.bne.es/es/LaBNE/PreservacionDominioES/
  31. ^ a b c http://www.padicat.cat/
  32. ^ a b c http://www.ondarenet.kultura.ejgv.euskadi.net/
  33. ^ a b c http://www.kb.se/english/find/internet/websites/
  34. ^ a b c http://aleph-archives.com/
  35. ^ a b c http://www.nb.admin.ch/nb_professionnel/01693/index.html?lang=en
  36. ^ a b c http://webarchive.lib.ntu.edu.tw/eng/default.asp
  37. ^ a b c http://webarchive.ncl.edu.tw/nclwa98Front/
  38. ^ a b c http://www.webarchive.org.uk/ukwa/
  39. ^ a b c http://www.hanzoarchives.com/
  40. ^ a b c http://www.nationalarchives.gov.uk/webarchive/
  41. ^ a b c http://www.archive.org
  42. ^ a b c http://www.reedtechwebarchiving.com/
  43. ^ a b c https://www1.columbia.edu/sec/cu/libraries/bts/web_resource_collection/index.html
  44. ^ a b c http://webarchives.ncdcr.gov/
  45. ^ a b c http://lanic.utexas.edu/project/archives/
  46. ^ a b c http://library.manoa.hawaii.edu/research/archiveit/
  47. ^ a b c http://www.loc.gov/webarchiving/
  48. ^ a b c http://wax.lib.harvard.edu/collections/home.do
  49. ^ a b c http://webarchives.cdlib.org/
  50. ^ a b c http://bentley.umich.edu/uarphome/webarchives/webarchive.php
  51. ^ a b c http://www.archive-it.org/public/partner.html?id=318
  52. ^ a b http://www.qumram.ch/en
  53. ^ http://www.saperion.com
  54. ^ http://www.statelibrary.tas.gov.au/collections/taho/legaldeposit
  55. ^ European Archive
  56. ^ http://la.boa-bw.de/
  57. ^ http://trove.nla.gov.au/website?q=
  58. ^ http://www.collectionscanada.gc.ca/webarchives/technical-details/index-e.html
  59. ^ http://la.boa-bw.de/index.do
  60. ^ http://webarchiv.bundestag.de
  61. ^ http://www.oasis.go.kr/ctrlu?cmd=search-dbsite
  62. ^ http://www.nb.no/sok/search.jsf
  63. ^ http://www.helveticat.ch
  64. ^ http://collections.europarchive.org/tna/adv_search/?lang=en&query=&where=text&y=17&x=30
  65. ^ http://nationalarchives.gov.uk/webarchive/atoz.htm
  66. ^ http://nationalarchives.gov.uk/webarchive/
  67. ^ http://web.archive.org/web/20080511024512/www.archive.org/web/researcher/tool_documentation.php
  68. ^ http://www.archive.org/about/using.php
  69. ^ a b http://www.archive-it.org/public/partner.html?id=304
  70. ^ http://webarchives.cdlib.org/a/AlternativeMassMedia
  71. ^ http://www.archive-it.org/public/partner.html?id=304
  72. ^ http://www.lib.utexas.edu/taro/index.html4